Review of the Geant4-DNA Simulation Toolkit for Radiobiological Applications at the Cellular and DNA Level

Simple Summary A brief description of the methodologies to simulate ionizing radiation transport in biologically relevant matter is presented. Emphasis is given to the physical, chemical, and biological models of Geant4-DNA that enable mechanistic radiobiological modeling at the cellular and DNA level, important to improve the efficacy of existing and novel radiotherapeutic modalities for the treatment of cancer. Abstract The Geant4-DNA low energy extension of the Geant4 Monte Carlo (MC) toolkit is a continuously evolving MC simulation code permitting mechanistic studies of cellular radiobiological effects. Geant4-DNA considers the physical, chemical, and biological stages of the action of ionizing radiation (in the form of x- and γ-ray photons, electrons and β±-rays, hadrons, α-particles, and a set of heavier ions) in living cells towards a variety of applications ranging from predicting radiotherapy outcomes to radiation protection both on earth and in space. In this work, we provide a brief, yet concise, overview of the progress that has been achieved so far concerning the different physical, physicochemical, chemical, and biological models implemented into Geant4-DNA, highlighting the latest developments. Specifically, the “dnadamage1” and “molecularDNA” applications which enable, for the first time within an open-source platform, quantitative predictions of early DNA damage in terms of single-strand-breaks (SSBs), double-strand-breaks (DSBs), and more complex clustered lesions for different DNA structures ranging from the nucleotide level to the entire genome. These developments are critically presented and discussed along with key benchmarking results. The Geant4-DNA toolkit, through its different set of models and functionalities, offers unique capabilities for elucidating the problem of radiation quality or the relative biological effectiveness (RBE) of different ionizing radiations which underlines nearly the whole spectrum of radiotherapeutic modalities, from external high-energy hadron beams to internal low-energy gamma and beta emitters that are used in brachytherapy sources and radiopharmaceuticals, respectively.


The Geant4-DNA Extension
Since 2007, Geant4 (release 9.1) is the only open access general-purpose radiation transport MC code offering, through its Geant4-DNA low-energy extension, track-structure capabilities in liquid water down to the eV energy range [22]. Liquid water has been historically the medium of choice in track-structure codes because of its abundance in cells (70-80% by weight) and also because of its role as a source of reactive free radicals [48]. Towards more realistic modelling of direct damage to DNA, several studies have presented interaction cross sections that are specific to DNA bases (or constituents) in the gas phase [49][50][51][52][53][54] some of them being part of the Geant4-DNA ongoing developments. Interaction cross sections that are specific to bulk DNA in the condensed-phase have also been presented based on the dielectric approach [55][56][57].
Geant4-DNA offers the possibility to transport interaction-by-interaction electrons, protons, hydrogen atoms, alphas, and some ions in liquid water medium. Due to the importance of low-and moderate-energy electrons (from few eV to 1 MeV) in track-structure simulations, as well as the large uncertainties that are associated with their transport in biological media, users have the possibility to select among three recommended sets of alternative physics models which correspond to different cross sections for elastic and inelastic scattering. These physics models are the default Geant4-DNA models or Option 2 constructor (available since 2007 in Geant4 version 9.1) [24], the improved models that were developed at the University of Ioannina [58] or Option 4 constructor (available since 2015 in Geant4 version 10.2), and the Option 6 constructor (available since 2017 in Geant4 version 10.4) [14,[22][23][24]. The Geant4-DNA models have been tested and validated against reference data (i.e., NIST, ICRU) and wherever available versus experimental data and other MC simulation studies. Comparisons between the condensed history models of Geant4 and the track-structure models of Geant4-DNA have also been undertaken for particular applications [14,23,[59][60][61][62][63].

Physical Interactions in Geant4-DNA: Energy-Loss Models
Energy-loss models in track-structure simulations determine the spatial distribution of ionizations and excitations, thus, the energy deposition processes taking place within the target volumes. Measured ionization and excitation cross section data for liquid water do not exist (contrary to the case of vapor water). Therefore, theoretical models play an important role in the development of energy-loss models for liquid water and a variety of approaches have been adopted, ranging from some well-established atomic/molecular models [64] to more complicated solid-state models [65,66]. However, the existence of spectroscopic data for several materials (including liquid water) concerning their absorption spectrum or their optical dielectric constants provides an indirect means to calculate inelastic scattering cross sections from first principles using the dielectric theory. Different analytic parameterizations of the dielectric response function [67,68] are currently being used in track-structure codes, such as the NOREC, PARTRAC, KURBUC, and GEANT4-DNA codes. In the framework of the plain wave Born approximation (PWBA), the dielectric response function completely determines inelastic scattering through the proportionality: where E is the energy transfer, q is the momentum transfer (or scattering angle), σ is the cross section, and Im[−1/ε] = Im[ε]/|ε| 2 is the energy-loss function (ELF). Due to its analytic properties, a Drude-like model of ε(E,q) is being used in all the above track-structure codes (NOREC, PARTRAC, KURBUC, GEANT4-DNA), which mostly differ on the values of the Drude coefficients and the details of the parameterization of experimental data. The advantage of a Drude representation of ε(E,q) against using more sophisticated models (e.g., Mermin) has been discussed elsewhere [66]. In Geant4-DNA Option 2 constructor, the experimental data for the imaginary part of the dielectric response function of liquid water [69] are partitioned to the outer ionization shells and excitation levels using the following parameterization [58]: where the index n runs over the ionization shells and the index k runs over the discrete excitation levels, D n (E;E n ) and D k *(E;E k ) are the ordinary and derivative Drude functions with coefficients that were determined by a fit to the experimental data at the optical limit (q = 0) under the constraint of the sum-rules, and B n,k are threshold energies (e.g., binding energies). In Geant4-DNA Option 4 constructor [58], Equation (2) is replaced by where F n,k (E) are analytic functions that are calculated by the new algorithm that is implemented in Option 4 that improve the consistency of the model ELF and facilitate the analytic calculation of the real part, Re[ε], via the Kramers-Kronig relation [58]. Despite using the same experimental optical data set with Option 2 as input, as well as the same ionization shells and excitation levels, substantially different ionization and excitation cross sections are obtained in Option 4 at low energies due to the implementation of the Emfietzoglou-Kyriakou partitioning algorithm [58]. This resulted in higher ion-pair energies (the so-called "W-values"), smaller penetration distances, and less diffused dose-pointkernels [58,70] while also influencing the calculated yields of direct DNA damage [71,72].
In addition, Option 4 includes some methodological changes for a more consistent implementation of the Coulomb and Mott corrections which quantitatively account for most of the exchange-correlation effect that is encountered in electron-electron interactions [73,74]. These corrections are applied to address more accurately the low energy electron inelastic cross sections which have a strong influence on modeling radiobiological effect sat the sub-cellular level.
An alternative to Option 2 and Option 4 constructors is the Option 6 constructor which corresponds to a re-engineering of the CPA100 track-structure code [75]. The original CPA100 code, which was a well-established code in microdosimetry, was developed and maintained by Michael Terrissol and co-workers [17] to simulate the step-by-step transport of electrons and photons in liquid water and different biomolecular targets, such as DNA bases and sugar-phosphate groups [76] in homogeneous as well as heterogeneous materials. CPA100 code can generate all the electronic and photonic cascades (Auger electrons, X-rays, and atomic reorganization). It can also simulate physicochemical and chemical stages during the early passage of particles in matter up to one microsecond to evaluate early DNA damage. In the last version of CPA100 code, the energy differential and the total cross sections of the five molecular orbitals are calculated using the BEB (Binary Encounter Bethe) model [64]. The BEB model is an exchange-corrected atomic model which requires only the knowledge of three physical parameters for each orbital: the binding energy, the mean kinetic energy, and the orbital occupation number. The first advantage of this model is that the cross sections are calculated analytically. The second advantage of the analytical form of the differential cross section is that the energy loss can be obtained by directly sampling the differential cross section expression without using interpolation in very large cross section tables. This is done via a composition sampling method that was originally developed in the last version of CPA100 [77]. The excitation cross sections for the five discrete levels are calculated in the first Born approximation using the optical data model of the complex dielectric response function coming from the work of Dingfelder et al. [78]. This model is also based on a Drude representation of ε(E,q) using the same optical dataset, electronic excitation levels, and dispersion relations that are similar to the other available electronic models in the Geant4-DNA constructors (Option 2 and Option 4). However, the resulting excitation cross sections are not the same due to a different set of Drude coefficients. Finally, in the Option 6 set of physics models, angular deflection in inelastic scattering for both excitation and ionization is considered based on the kinematics of binary collisions.

The Physico-Chemical and Chemical Stages of Water Radiolysis
After ionization or excitation in an energy deposition process, water molecules can dissociate or decay into new reactant species (e − aq , H 2 , H • , • OH, H 3 O + , . . . ). These new molecular species can diffuse and interact amongst themselves, producing other molecules. Consequently, after the physical interactions, the number of molecules of a given species evolves in time. During this process, free radicals are created and may react with biological molecules such as DNA, RNA, proteins, etc., and finally, induce early damage. In Geant4-DNA, since version 10.1, the physico-chemical and chemical stages of water radiolysis were introduced [79] that allow simulations of the production, chemical reactions, and transport of reactive species along the passage of radiation up to 1 µs. These two stages, which technically belong to the chemistry module of Geant4-DNA, are described below in more detail.

The Physico-Chemical Stage
In Geant4-DNA, the physico-chemical stage includes the thermalization process of secondary electrons and electronic events that are occurring in ionized and excited water molecules up to 1 picosecond (ps). The primary or secondary electrons that are produced during the physical stage continue to lose their energy through the dissociative attachment and vibrational excitation processes, get thermalized within 110 femtoseconds (fs), and then become solvated within about 250 fs [11,80,81]. The electronic holes of ionized water molecules quickly join proton transfer processes that are occurring between them or a nearby water molecule to create H 3 O + and • OH molecules or participate with solvated electrons in the recombination process which reforms water molecules. The excited water molecules go through dissociation channels which depend on their excited level. The hot dissociation fragments of water molecules and electrons are assumed to become all  Table 2 presents the decay channels that are implemented in the default physico-chemical stage of Geant4-DNA [80].
where r 0 is the initial position and r is the possible next position of free species for the probability p(r, ∆t|r 0 ) in a time interval ∆t. This probability depends on the diffusion coefficient D in water for each species type. Equation (4) means that the determination of the species positions in Brownian motion is randomly based on a 3D-Gaussian distribution and can be applied only in a space and a time interval ∆t where species are free. Figure 1 shows the distributions of displacements that were obtained by a sampling method in Geant4-DNA at 1 µs, compared with the distribution that was calculated by Equation (4). The Geant4-DNA sampling method uses a generator of random numbers (R x , R y , R z ) which are normally distributed with zero mean and unit variance to compute the position of the molecule at time t + ∆t as: the position of the molecule at time + ∆ as: In the diffusion model that was described earlier, only the molecules of interest are explicitly simulated and the solvent (water) is considered as a continuum. Under these conditions, free species diffuse until two reactants encounter each other. Let's consider two reactants A and B in encounter, they will first form a complex (A:B) that might either re-dissociate into A and B or react and create new products P. The chemical equation is: where is the reaction rate constant of the (A:B) complex formation, is the dissociation rate constant of the (A:B) complex, and the activation reaction rate constant. If the reaction is very effective, the reactants will form the new products as soon as the molecules encounter each other. This is described by a very high activation reaction rate constant ≫ , (in other words, →∞). This allows us to define the reaction rate as: In the diffusion model that was described earlier, only the molecules of interest are explicitly simulated and the solvent (water) is considered as a continuum. Under these conditions, free species diffuse until two reactants encounter each other. Let's consider two reactants A and B in encounter, they will first form a complex (A:B) that might either re-dissociate into A and B or react and create new products P. The chemical equation is: where k C is the reaction rate constant of the (A:B) complex formation, k D is the dissociation rate constant of the (A:B) complex, and k R the activation reaction rate constant. If the reaction is very effective, the reactants will form the new products as soon as the molecules encounter each other. This is described by a very high activation reaction rate constant k R k C ,k D (in other words, k R →∞). This allows us to define the reaction rate k as: These reactions are named diffusion-controlled reactions and are considered to be very effective and occurring immediately when molecules encounter each other. The chosen criterion is usually the separation distance between the reactants; when two reactants are under a certain threshold, the reaction occurs. This threshold is calculated from the reaction rates by the Smoluchowski theory in case of particles without charge: and by the Smoluchowski-Debye theory for electro-static interactions: where with D A and D B are the diffusion coefficients of reactants A and B in water, and N A is the Avogadro number. Table 3 provides the implemented chemical reactions and corresponding reaction rates of Geant4-DNA (from Geant4 version 10.7). Table 3. Implemented chemical reactions and reaction rate constants k [82,83] as defined in the two chemistry constructors "G4EmDNAChemistry" and "G4EmDNAChemistry_option1" that apply at ambient temperature (25 • C) of Geant4-DNA version 10.7.

G4EmDNAChemistry
G4EmDNAChemistry_ Option1 To avoid losing potential reactions, time intervals of the species in diffusion are usually defined by discretized small steps (or time step) [84]. However, to maintain the computational efficiency of the model, the time steps should not be too small [80]. Therefore, to optimize the computation time, improved time step models are necessary.
In Geant4-DNA, since version 10.1, the Step by Step (SBS) approach which uses the dynamic time step model, has been implemented. A detailed description of this method can be found elsewhere [79,84]. This method briefly proposes a time step model which allows the definition of virtual time steps during which the reaction cannot occur with at least 95% (default value) of confidence. One can visualize that as creating a protection domain surrounding the particle, ensuring that this particle will not react with any other particle with 95% confidence up to its border. Therefore, within this protection domain, the particle is considered approximately independent (or "free") and can take longer diffusion time steps. Equation (4) is used to determine the dynamic time step. This process is repeated many times until a chemical reaction takes place. Thus, we may have one-or many-time steps before the reaction occurs. To avoid the scenario of too many small time steps, the Minimum Time Steps and the Brownian bridge approaches have been added to limit the number of time steps to an encounter. While Minimum Time Steps constrain the minimum time-step that is allowed for each reactant pair, the Brownian bridge technique computes the probability of encounter during their Minimum Time Steps and thus compensates for the "missed" reactions. The use of these Minimum Time Steps in Geant4-DNA can be decided by users. Figure 2 shows the time-dependent radiolytic yields G-values using the Minimum Time Step at 1 ps given by the two chemistry constructors "G4EmDNAChemistry" and "G4EmDNAChemistry_option1" of Geant4-DNA. While the "G4EmDNAChemistry" constructor, that is based on the PARTRAC MCTS code [36] was available in Geant4 since version 10.1, the "G4EmDNAChemistry_option1" including diffusion coefficients and chemical reaction rates from the RE-/RITRACKS MCTS code [40] was integrated since version 10.5. Both chemistry constructors currently use the SBS approach. The radiolytic yield G-value is defined as the number of molecules at a given time for each 100 eV of deposited energy. In this simulation, Geant4 version 10.7 is used with 80 keV incident electrons irradiated from the center of a very large water volume (see Geant4-DNA example "chem5") and the minimum and maximum energy depositions set at 1 keV et 2 keV, respectively [11]. A general agreement of these results with experimental data can be observed. It should be noted that the last two parameters are used for energy threshold selection to restrict such energy deposition in a small segment of the entire physical track.
While the minimum energy deposition limits the energy loss of the primary particle, the event having total energy deposition larger than the maximum energy deposition will be aborted and is fully ignored in the simulation. deposited energy. In this simulation, Geant4 version 10.7 is used with 80 keV incident electrons irradiated from the center of a very large water volume (see Geant4-DNA example "chem5") and the minimum and maximum energy depositions set at 1 keV et 2 keV, respectively [11]. A general agreement of these results with experimental data can be observed. It should be noted that the last two parameters are used for energy threshold selection to restrict such energy deposition in a small segment of the entire physical track. While the minimum energy deposition limits the energy loss of the primary particle, the event having total energy deposition larger than the maximum energy deposition will be aborted and is fully ignored in the simulation.  deposited energy. In this simulation, Geant4 version 10.7 is used with 80 keV incident electrons irradiated from the center of a very large water volume (see Geant4-DNA example "chem5") and the minimum and maximum energy depositions set at 1 keV et 2 keV, respectively [11]. A general agreement of these results with experimental data can be observed. It should be noted that the last two parameters are used for energy threshold selection to restrict such energy deposition in a small segment of the entire physical track. While the minimum energy deposition limits the energy loss of the primary particle, the event having total energy deposition larger than the maximum energy deposition will be aborted and is fully ignored in the simulation.  version 10.5. Both chemistry constructors currently use the SBS approach. The radiolytic yield G-value is defined as the number of molecules at a given time for each 100 eV of deposited energy. In this simulation, Geant4 version 10.7 is used with 80 keV incident electrons irradiated from the center of a very large water volume (see Geant4-DNA example "chem5") and the minimum and maximum energy depositions set at 1 keV et 2 keV, respectively [11]. A general agreement of these results with experimental data can be observed. It should be noted that the last two parameters are used for energy threshold selection to restrict such energy deposition in a small segment of the entire physical track. While the minimum energy deposition limits the energy loss of the primary particle, the event having total energy deposition larger than the maximum energy deposition will be aborted and is fully ignored in the simulation.

Independent Reaction Time
Although the dynamic time step model is used to optimize the choice of the time step, the calculation time remains the main drawback of the SBS approach when the simulations deal with a large number of species [101,102]. To avoid this drawback, a more efficient method called Independent Reaction Time (IRT) has been implemented in Geant4-DNA in the version 10.7 of Geant4. By simplifying the multiple particle problem to the two-particle problem in an approximation, the IRT method proposes that possible reactant pairs are scheduled independently and organized in a reaction queue sorted by their time to reaction (or "reaction time"). The reactions having the shortest reaction time are then processed successively until no more reaction is in the reaction queue, or the reaction times are longer than the end time of the simulation.
The reaction times are sampled according to the reaction probability, which is calculated by the solution of the Smoluchowski equation (4) with boundary conditions (for totally or partially diffusion-controlled reactions) transformed to radial Green's function in a spherical coordinate system. The details of the IRT are found elsewhere in [100,[103][104][105].
This approach allows the performance of water radiolysis simulation with reasonable computational time. Such time depends strongly on the linear energy transfer (LET) of the irradiation source as shown in Figure 3. The IRT model can reduce the simulation time up to 1000 times compared to the SBS method for low-LET simulations as, for instance 1 MeV electrons. Since the search algorithm of the IRT method is less efficient at the high density of chemical species, the ratio of the calculation efficiency between SBS and IRT decreases as a function of LET. Nevertheless, the simulation time for the chemical stage is remarkably reduced using the IRT model.

History
It is recognized that DNA is the privileged target to be taken into account to understand and predict the consequences of irradiation at the cellular level [107]. DNA damages are categorized as single-strand breaks (SSB), DSBs, base damage, and complex damage when a combination of those is produced. Among the different types of DNA damage, double strand breaks (DSBs) are considered the most deleterious and can be correlated with clonogenic cell kill [108]. The more complex the initial damage to DNA, the higher the probability of being mis-repaired and thus consequences on cell fate can be expected. Therefore, modelling of early damage to DNA represents a powerful tool to predict irradiation risk [109]. This is only possible if a detailed understanding of the mechanisms of ionizing radiation action on living organisms is implemented in the code.
To this end, MC track-structure codes have been developed to describe molecular interactions leading to direct and indirect effects which are at the origin of early DNA damage. The pioneering works have evolved from the consideration of direct effects alone [110,111] to the inclusion of indirect damage [112][113][114]. A key point in these simulations is the geometrical description of the target molecule, i.e., the DNA structure. Indeed, a good precision in the volume of its constituents and their relative position has a direct impact on the number and complexity of damage that is computed from both the physical

History
It is recognized that DNA is the privileged target to be taken into account to understand and predict the consequences of irradiation at the cellular level [107]. DNA damages are categorized as single-strand breaks (SSB), DSBs, base damage, and complex damage when a combination of those is produced. Among the different types of DNA damage, double strand breaks (DSBs) are considered the most deleterious and can be correlated with clonogenic cell kill [108]. The more complex the initial damage to DNA, the higher the probability of being mis-repaired and thus consequences on cell fate can be expected. Therefore, modelling of early damage to DNA represents a powerful tool to predict irradiation risk [109]. This is only possible if a detailed understanding of the mechanisms of ionizing radiation action on living organisms is implemented in the code.
To this end, MC track-structure codes have been developed to describe molecular interactions leading to direct and indirect effects which are at the origin of early DNA damage. The pioneering works have evolved from the consideration of direct effects alone [110,111] to the inclusion of indirect damage [112][113][114]. A key point in these simulations is the geometrical description of the target molecule, i.e., the DNA structure. Indeed, a good precision in the volume of its constituents and their relative position has a direct impact on the number and complexity of damage that is computed from both the physical and chemical stage. This structure has to be defined as precise as possible at the nanometric scale (nucleotide scale) as well as at higher scale (micrometric) where the DNA structure is also meaningful (chromatin fiber, domains, chromatin compaction depending on the cell cycle, etc). Historically, the DNA geometrical models that were implemented range from simplified representations that were based on cylinders [115] to highly complex descriptions of the whole genome [9,116,117], reaching, in some cases, an atomistic definition of the DNA components [118,119]. In general, inelastic collisions are assumed to cause strand breaks depending on the amount of energy that is deposited in the sensitive sites, including sugar phosphate backbone and DNA hydration shell. This amount of energy is, to a certain extent, a modifiable parameter that is specific to each code. The physicochemical and chemical stages are also crucial since it should be noted that around 70% of the damage from low LET particles is due to indirect effects. It is, therefore, generally considered that only a certain fraction of the simulated hydroxyl radical interactions with the deoxyribose moiety cause DNA strand breakage. Moreover, it is generally accepted to limit the duration of the simulation of the chemical step which makes it possible to account for the scavenging effect of the cellular environment. From this perspective, the results of several MC track-structure codes that were developed for modelling cellular radiation response have been historically (and successfully) compared to experimental data mainly in terms of DSB yields or chromatin fragments [35,39,114].

"FullSim" Complete Simulation Chain for DSBs Calculations
In 2017, Geant4-DNA was used for the first time for simulating the combination of physical, physico-chemical, and chemical stages for the assessment of early radiation damage in terms of DSB yield and complexity at the scale of an entire human genome (fibroblast cell) [116]. One of the key elements of the simulation is the sampling of energy deposition and radical reactions leading to damage in a DNA geometry which must, therefore, be as realistic as possible. The DNA models that are used within this simulation chain are built with the DnaFabric software [120]. This independent software was designed to facilitate the generation of DNA geometries at different scales ranging from the nucleotide pair to the entire genome of any cell type [121] and respecting the different compaction levels. Once the geometrical model is generated, a text file is written that can be read by the DetectorConstruction class of the Geant4-DNA simulation chain.
With DnaFabric, B-DNA, which is the most common form of DNA, is modelled by representing each nucleotide constituent (2-deoxyribose, phosphate, and DNA bases) of the genome with spherical shapes and twisted to generate nucleosomes that are linked together to create the chromatin fiber as shown in Figure 4. The continuity of the fiber within each chromosome is ensured by using voxels describing the 30 nm chromatin fiber taking different directions that can be connected to fill the space. From the biological point of view, different DNA densities and distributions are observed at the cellular (microscopic) level depending on the cell type and the cell cycle. Furthermore, chromatin compaction (nanometric level) has a role in the protection of DNA from damage induction and it was shown that DSB production occurs more frequently in decondensed chromatin [122,123]. The condensed form of the chromatin fiber is the heterochromatin whereas the decondensed form is called euchromatin.  Another key element in the simulation chain is the definition of parameters, as shown Table 4, that allow the translation of the simulated information (position of energy deposits and reaction of radicals in the DNA geometry) into strand breaks that are then converted into DSB data. The set of parameters that are used by default to compute the number of DSBs includes: (i) a threshold for energy that is deposited in the backbone (including hydration shell) Elower = Ehigher = 17.5 eV to induce a direct strand break, (ii) an effective rate POH = 40% in SSB production from the reaction of a hydroxyl radical with deoxyribose to produce an indirect strand break, (iii) a duration of the chemical step fixed to Tchem =2.5 ns (use of chemistry module of Gean4.10.1) to prevent radicals far from the DNA structure from being at the origin of indirect effects, and (iv) a DSB is scored when at least two SSBs that are located on opposite strands are separated by less than 10 bp.  Heterochromatin fiber voxel models were first developed in DNAFabric [116] and then complemented by euchromatin models [124]. The combination of the two types of compaction allowed for the accounting of the distribution of heterochromatin and euchromatin regions that were measured experimentally to study its influence on the DSB yield that was simulated for proton irradiations. Results have shown the DSB yield increasing when geometrical models that are closer to the observation are taken into account [124] thus verifying the protective role of heterochromatin which is reproducible in the calculation chain.
Another key element in the simulation chain is the definition of parameters, as shown Table 4, that allow the translation of the simulated information (position of energy deposits and reaction of radicals in the DNA geometry) into strand breaks that are then converted into DSB data. The set of parameters that are used by default to compute the number of DSBs includes: (i) a threshold for energy that is deposited in the backbone (including hydration shell) E lower = E higher = 17.5 eV to induce a direct strand break, (ii) an effective rate P OH = 40% in SSB production from the reaction of a hydroxyl radical with deoxyribose to produce an indirect strand break, (iii) a duration of the chemical step fixed to T chem = 2.5 ns (use of chemistry module of Gean4.10.1) to prevent radicals far from the DNA structure from being at the origin of indirect effects, and (iv) a DSB is scored when at least two SSBs that are located on opposite strands are separated by less than 10 bp. The comparisons between the simulated results and the experimental data have been carried out for both high and low LET particles.
DSB yields were first calculated in a fibroblast cell nucleus that was filled with only heterochromatin and compared to the literature for proton irradiations [116]. The good agreement that was found ( Figure 5) after fixing the parameters that are presented above, showed that the whole simulation chain was able to reproduce the trend over LET of experimental data, taking into account the measurement uncertainties correctly. A good agreement was also found in a second publication comparing the simulated DSB yield that was produced by X-rays of different spectra (40 kVp, 220 kVp and 4 MV) with the experimental measurements of γ-H2AX for HUVECs cells that were irradiated in the simulated facilities as shown Table 5 [125]. The simulation setup included the nucleus geometry combining heterochromatin and heterochromatin regions that were distributed randomly but respecting the global rates that were observed experimentally. Furthermore, simulations have shown that the differences between different beam qualities were due to the proportion of secondary electrons of energy below 10 keV, highlighting the crucial role of these particles. Since the simulation chain does not take repair processes into account, it is essential to compare simulations and experiments at the maximum gamma-H2AX signal, as suggested in [125], to minimize the underestimation of early DNA damage. Nevertheless, considering the rather slow repair of DSBs, a better match is expected in the case of low LET irradiations because the correspondence 1:1 DSB/foci is more reliable. Indeed, for high LET irradiations, the measured foci may be sites containing clustered DSBs. role of these particles. Since the simulation chain does not take repair processes into account, it is essential to compare simulations and experiments at the maximum gamma-H2AX signal, as suggested in [125], to minimize the underestimation of early DNA damage. Nevertheless, considering the rather slow repair of DSBs, a better match is expected in the case of low LET irradiations because the correspondence 1:1 DSB/foci is more reliable. Indeed, for high LET irradiations, the measured foci may be sites containing clustered DSBs.   Table 5. Comparison of the simulated results and the experimental for a dose of 1 Gy for different X-ray beams: mean number of γ-H2AX foci per endothelial cell nucleus in Gap0/Gap1 (30 min post-irradiation) and mean number of simulated DSB per nucleus, reproduced from [125].

Review of "MolecularDNA"Application
An application named "molecularDNA" was developed for the mechanistic modeling of radiation track-structure effects and to test their ability to model the induction of cellular damage following irradiation, including, for the first time, the IRT approach. This application linked the physical, chemical (IRT), and geometrical interfaces of Geant4 to investigate early DNA damage that was induced by radiation and the subsequent biological response. The development and main principles of the application are described in [126], followed by a series of publications [71,106,117,127,128] and is designed for general purpose investigation of how DNA is damaged in a variety of biological geometry models. It contains a library of geometrical models of sub-cellular component units such as simple DNA fibres, chromatin fibres, and fractal geometries that can combine smaller segments to build a longer chain. The molecularDNA application uses the damage classification system that was proposed by Nikjoo [129], which allows the calculation of the number of initial DNA damage events for a given simulation. The classification system considers the source of the damage (direct and indirect damage induction) and the complexity of the damage, including simple SSBs, DSBs, and complex clustered damage. By using the scored initial DNA damage and the classified damage detail as inputs, the application allows calculation of the protein yield that is involved from the four major repair processes in normal human fiblobrasts.
The application was validated through a comparative investigation of Geant4-DNA physics, chemistry, and DNA damage models against some well-established MC simulation platforms for radiobiological applications [117], namely KURBUC and PARTRAC. For the assessment, a 3 µm radius liquid water sphere was filled with 200,000 randomly-distributed, non-overlapping, individual DNA geometries. Each DNA geometry was a 216 base-pair (bp) straight DNA segment in a rectangular placement volume. Primary electrons (energy range of 300 eV to 4.5 keV) were generated randomly with a random direction in a smaller 500 nm radius sphere in the centre of the test region. In a follow-up study, the application was extended to allow the simulation of initial DNA damage in an Escherichia coli (E. coli) cell using a combination of straight and turned DNA segments [71] that were joined together to mimic a fractal pattern. Applying a mask to this otherwise rectangular pattern allowed simulation of the E. coli ellipsoidal geometry as shown in the left panel of Figure 6 (a long semi-major axis of 950 µm and two short semi-major axes of 400 µm) containing 4.63 Mbp. The simulated results are validated against experimental data for plasmid that was irradiated by both electrons (10 keV) and protons , as well as against past simulations. Through the above two studies, the damage models and parameters were tuned to be consistent with experimental data and the simulated result of KURBUC. For direct damage, both a threshold model (where breaks occur when energy depositions are E lower = E higher = 17.5 eV or higher, proposed by KURBUC [129]) and proportional damage model (where breaks occur with a linearly increasing likelihood from E lower = 5 to E higher = 37.5 eV, proposed by PARTRAC [36]) were tested for different energy deposition ranges. The probability of indirect damage (P OH ) was also tested within the ranges for which it had been experimentally measured (0.4-0.8) [130,131]. As a result of this tuning, the Geant4-DNA simulations that were reproduced the results of KURBUC with similar parameters used in KURBUC as shown in right panel of Figure 6.   [131] to build a human cell nucleus that was composed of fractally distributed chromatin fibres as shown in Figure 7. In this newly developed cell nucleus model, the DNA fibre is folded by histones (which were defined as spheres, each with a 25 Å radius) compactly forming chromatin fibre. In the study by Sakata et al. [131], the damage parameters were re-adjusted within a reasonable range to achieve agreement with experimental data for proton irradiation that was induced SSB and DSB yields in a human cell. The geometrical model was further improved in the study of Sakata et al. (2019) [127] to build a human cell nucleus that was composed of fractally distributed chromatin fibres as shown in Figure 7. In this newly developed cell nucleus model, the DNA fibre is folded by histones (which were defined as spheres, each with a 25 Å radius) compactly forming chromatin fibre. In the study by Sakata et al. [127], the damage parameters were re-adjusted within a reasonable range to achieve agreement with experimental data for proton irradiation that was induced SSB and DSB yields in a human cell.
The geometrical model was further improved in the study of Sakata et al. (2019) [131] to build a human cell nucleus that was composed of fractally distributed chromatin fibres as shown in Figure 7. In this newly developed cell nucleus model, the DNA fibre is folded by histones (which were defined as spheres, each with a 25 Å radius) compactly forming chromatin fibre. In the study by Sakata et al. [131], the damage parameters were re-adjusted within a reasonable range to achieve agreement with experimental data for proton irradiation that was induced SSB and DSB yields in a human cell. Figure 7. The structure of cell nucleus and its sub-biological components for the molecularDNA application [127].
In reality, a cell nucleus is confined to a cytoplasm and irradiation events involve the surrounding experimental equipment. Thus, in the follow-up study by Sakata et al. [127], the geometrical model was upgraded with an ellipsoidal water absorber to mimic the cytoplasm and a rectangular absorber to imitate the window of a cell culture flask (shown in the left-top panel of Figure 8). Additionally, the model parameters were slightly adjusted, and a prediction model that was given by Belov et al. [132] that calculates kinetics Figure 7. The structure of cell nucleus and its sub-biological components for the molecularDNA application [128].
In reality, a cell nucleus is confined to a cytoplasm and irradiation events involve the surrounding experimental equipment. Thus, in the follow-up study by Sakata et al. [128], the geometrical model was upgraded with an ellipsoidal water absorber to mimic the cytoplasm and a rectangular absorber to imitate the window of a cell culture flask (shown in the left-top panel of Figure 8). Additionally, the model parameters were slightly adjusted, and a prediction model that was given by Belov et al. [132] that calculates kinetics of proteins that are involved in four major repair pathways for normal human fiblobrasts was integrated into the application. The fully integrated application was validated with both the experimental DSB yields (as Geant4-DNA 2020 in Figure 8, from [128]) and accumulated γ-H2AX yields. Finally, the influence of the updated geometrical model and the optimal parameters that were proposed with the recent improvements in physical and chemical stages [11,133] were evaluated (see Table 4) and validated in [106] for both gamma-and proton-irradiations ('this work' in the left and right panels of Figure 8). of proteins that are involved in four major repair pathways for normal human fiblobrasts was integrated into the application. The fully integrated application was validated with both the experimental DSB yields (as Geant4-DNA 2020 in Figure 8, from [127]) and accumulated γ-H2AX yields. Finally, the influence of the updated geometrical model and the optimal parameters that were proposed with the recent improvements in physical and chemical stages [11,133] were evaluated (see Table 4) and validated in [106] for both gamma-and proton-irradiations ('this work' in the left and right panels of Figure 8).

Figure 8. Left-top:
The geometrical configuration for a human cell from Sakata et al. [127]. Right: the DSB yields as a function of LETtaken from Shin [106]. Left-bottom: the yield of γ-H2AX as a function of time after irradiation taken from [106].
The "molecularDNA" application provides a fully integrated simulation chain consisting of the physical, chemical (IRT), and biological stages of irradiation at the sub-cellular scale. This application has been fully validated through a series of Geant4-DNA investigations and the simulation results of DSB yields for a human fibroblast cell are in good agreement with experimental data. The "molecularDNA" application will soon be released in Geant4 and can be used for understanding the mechanisms leading to cellular radiobiological effects.  [128]. Right: the DSB yields as a function of LETtaken from Shin [106]. (Left-bottom): the yield of γ-H2AX as a function of time after irradiation taken from [106].
The "molecularDNA" application provides a fully integrated simulation chain consisting of the physical, chemical (IRT), and biological stages of irradiation at the sub-cellular scale. This application has been fully validated through a series of Geant4-DNA investigations and the simulation results of DSB yields for a human fibroblast cell are in good agreement with experimental data. The "molecularDNA" application will soon be released in Geant4 and can be used for understanding the mechanisms leading to cellular radiobiological effects.

Geant4-DNA Extended Examples
As discussed in the previous sections, Geant4-DNA provides functionalities for the simulation of the interactions of ionizing radiation in liquid water as well as the modeling of pre-chemical and chemical stages of water radiolysis that can be combined with simplified models of biological cellular and sub-cellular targets for damage and repair prediction. However, like Geant4, Geant4-DNA provides a set of computing libraries, and their use requires a minimum knowledge of C++ coding. Therefore, to help users understand the functionalities and develop new applications, the Geant4-DNA collaboration has developed a set of examples that are located in the "extended/medical/dna" category of the Geant4 examples that present the usage of processes and models covering from physical interactions to the chemical stage including simplified biological geometric models. A recent publication that was dedicated to track-structure simulations in liquid water medium describes these examples and their main objectives [23]. Interested readers are encouraged to consult the associated references for a detailed description of each particular application. Beyond their pedagogical role, these examples also allow the verification and validation of Geant4-DNA simulations against literature data or international recommendations as well as regular Geant4 regression tests being performed to test each new Geant4 release [23,134,135]. These examples are maintained and updated along with the Geant4 bi-annual releases. They are briefly described in the following subsections and categorized according to the stage of radiation action they serve.

Physics Examples
These Geant4-DNA examples are directly related to the main transport and energy deposition magnitudes. In each of these examples, the irradiated geometry and physics list can be modified. In many cases, condensed-history physics models can also be enabled.

•
The "clustering" example calculates the energy deposition with a dedicated clustering algorithm to assess DNA strand breaks in a simple liquid water geometry [14]; • "dnaphysics" is a general example that enables track-structure simulation of charged particles in a liquid water geometry and allows for the automatic combination between Geant4-DNA physics models and condensed-history models at higher energies (i.e., above 1 MeV) and can be used for benchmarking simulations that are related to track-structure characteristics [23]; • "icsd", that stands for ionization cluster size distribution, calculates the number of ionizations for each simulated track in a cylinder mimicking a piece of chromatin and uses DNA-like material's cross sections that were obtained experimentally or by simulations [50]; • "mfp" stands for mean free path and allows the calculation of the aforementioned distance and related distance quantities for a charged particle in a sphere geometry of liquid water [23]; • "microdosimetry" simulates lineal and specific energy distributions and related quantities in liquid water spheres that are randomly placed along the particle track [59]; • "microprox" is another microdosimetric example that calculates proximity functions from energy depositions scored in liquid water spherical shells from randomly selected hits [60]; • "range" example performs a simulation of penetration distances in liquid water [70]; • "slowing" enables simulation of the slowing down spectra of electrons in a cube of liquid water [136]; • "splitting" uses variance reduction techniques to improve the efficiency of the calculation of ionization cluster size distributions. This is done in a nm sized cylinder as in the case of the icsd example and aims to separate secondaries that are generated within the cylinder to avoid the overlapping of tracks [137]; • "spower" allows for stopping power simulations of particles in liquid water with the use of specific physics modules that enable the use of a stationary mode for appropriate computation [23]; • "svalue" calculates the dose to a target volume per unit of cumulated activity in a source volume, called S-value [138,139]. The source and target volumes can be different cell compartments or an entire cell of a simple spherical geometry which can be modified to account for more complex cell geometries, as has been done in many studies i.e. [140,141]; • "wvalue" serves to simulate the mean energy that is expended to form an ion pair known as W-value. It also provides information on the total number of ionizations in a liquid water volume and its penetration details. It is a useful benchmark simulation for the inelastic models given that elastic interactions are indifferent in this simulation scheme [23,58].

Chemical Examples
The"chemX" examples provide the guide for the chemical module from the activation of the chemical stage to the calculation of radiochemical yields ("G-value") as a funtion of time. In each example, irradiated geometry is a large water volume and as in the case of physics examples, the physics list can be modified.

•
"chem1" aims to show how to activate or deactivate physicochemical and chemical stage after physical stage. Chemical reactions are printed and the step-by-step model is used by default. • "chem2" provides a user-class "TimeStepAction" which allows users to change Minimum Time Steps. These parameters constrain the minimum time-step that is allowed for each reactant pair using the step-by-step model. The user-class also shows how to print reaction information such as reactants and products as well as their positions. • "chem3" illustrates how to implement user actions in the chemistry module using the step-by-step model. Users can also visualize the trajectories of the chemical species in time and space using the graphical user interface. • "chem4" provides scorer classes to compute radiochemical yields ("G") versus time using the step-by-step model, including a dedicated ROOT graphical interface. The G-value is useful for benchmark simulations in comparing with other MC codes and experimental data [80]. • "chem5" computes radiochemical yields ("G") versus time using alternative physics and chemical reaction lists using the step-by-step model [142]. • "chem6" computes radiochemical yields ("G") versus time and LET using the IRT model with full macro control [11,104].

The dnadamage1 Example
Following the "FullSim" simulation chain that is presented above [116], the recent Geant4-DNA example dnadamage1 method has been released since Geant4.10.6. In this example, we placed a cubic volume of 40 × 40 × 40 nm 3 at the center of a 2 × 2 × 2 µm 3 box made of liquid water (1 g/cm 3 ). Inside the 40 × 40 × 40 nm 3 cubic volume, 3640 nucleotide pairs were built to form a piece of 40 nm heterochromatin straight fiber. This geometrical DNA model was generated with the DNAFabric software. More information about the generation of this geometrical model can be found elsewhere [120]. The total number of strand breaks is computed from the combination of direct and indirect damages. Concerning the physical stage leading to direct damage, the default G4EmDNAPhysics is used. Direct damages are scored if the cumulative deposited energy from ionizations and excitations in the individual volumes of a nucleotide backbone (i.e., the volumes that are representing a group of the phosphate, the 2-deoxyribose, and the hydration shell) is greater than 17.5 eV. For the chemical stage, the G4EmDNAChemistry_option2 constructor is then used to simulate the species diffusion and their reactions with each other or with DNA elements (phosphate, the 2-deoxyribose, and base pairs) using the current SBS model [79]. A reaction between OH•radicals and static DNA elements is counted as primary damage. When one reaction happens, the radical is killed and the damaged DNA element is no longer available for further reaction. It has to be noted that simulated damage is primary damage that is transformed into a SSB with a probability of 42% [116]. More details about the C++ classes and their structure can be found through the README file of the example. To improve the computation time of the example, the IRT method is currently implemented in this example as an option [105].

Conclusions
MC track-structure codes are capable of providing both the spatial pattern of the energy deposition within a medium as well as the details of the molecular modifications that are taking place after irradiation. As of this moment, this has not been achieved experimentally. The value of such MC codes in elucidating the mechanisms of cellular damage from ionizing radiation is beyond doubt. The spatial distribution of the interaction events dictates the proximity of DNA strand breaks and, at the same time, the alterations that are taking place in the target molecule(s) determine the chemical modifications. It must always be stressed that all the results that are obtained with the MC technique are as accurate as the input information (e.g., the interaction cross sections for the physical stage of radiation action).
The Geant4-DNA low energy extension that was developed for biomedical applications is constantly evolving in terms of the development of physics models towards the accomplishment of a realistic cellular environment subject to irradiation conditions. Currently, it offers different sets of physics models, chemistry modules, and cell geometries. The combination of the above has allowed the realization of mechanistic studies of cellular DNA damage and repair. Extension of the Geant4-DNA Option 4 track-structure model for electrons up to 10 MeV is underway as well as two alternative models for protons up to 300 MeV. Furthermore, material that is specific cross sections for biopolymers [51] and for gold [143,144] that are used in nanodosimetry have already been developed and tested and they are soon going to be made available to the scientific community through Geant4-DNA. Especially, the cross sections for gold can contribute significantly to the gold nanoparticleaided radiotherapy research [145,146]. The modeling of the chemical stage is currently being improved [133] and extended to longer times and macroscopic volumes [147]. This is a requirement for easier comparison with radiolysis experiments, which also paves the way to the simulation of radiolysis in FLASH radiotherapy conditions, currently a very active research topic. A library of multi-scale geometries from molecules to assemblies of cells that are compatible with Geant4-DNA physical and chemical interfaces, will also be made available to users in the near future, and efforts to improve the computational performance of Geant4-DNA will continue. All these developments will permit a wider range of radiotherapeutic applications of the code under different irradiation scenarios, to be studied in an in silico, bottom-up approach.
The activity on DNA damage and response is intense within the Geant4-DNA framework and important studies have been published that involve mechanistic studies of DNA damage taking into account details of chemistry [11,71,104,128,133]. Such work will continue towards the goal to connect the irradiation of a cell environment to the DNA response to damage and repair. The ultimate goal is to offer the scientific community all the state-of-the-art tools to study the radiation effects in DNA and cells and to illuminate differences in RBE for low and high LET beams which is a matter of importance, not only for radiotherapy purposes, but also for space radiation risk studies and radiation protection low dose issues. It is envisioned that Geant4-DNA will offer a complete open-source platform available in Geant4 that is able to simulate physical, physicochemical, chemical,